%
%   This program is free software: you can redistribute it and/or modify
%   it under the terms of the GNU General Public License as published by
%   the Free Software Foundation, either version 3 of the License, or
%   (at your option) any later version.
%
%   This program is distributed in the hope that it will be useful,
%   but WITHOUT ANY WARRANTY; without even the implied warranty of
%   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
%   GNU General Public License for more details.
%
%   You should have received a copy of the GNU General Public License
%   along with this program.  If not, see <http://www.gnu.org/licenses/>.
%

% Version: $Revision: 6196 $

The previous chapter described how to extend Weka to add your own
learning algorithms and various enhancements to the user
interfaces. This chapter describes how such enhancements can be
assembled into a ``package'' that can be accessed via Weka's package
management system. Bundling your enhancements in a package makes it
easy to share with other Weka users.

In this chapter we refer to a ``package'' as an archive containing
various resources such as compiled code, source code, javadocs,
package description files (meta data), third-party libraries and
configuration property files. Not all of the preceding may be in a
given package, and there may be other resources included as well. This
concept of a ``package'' is quite different to that of a Java packages, which
simply define how classes are arranged hierarchically.

\section{Where does Weka store packages and other configuration stuff?}

By default, Weka stores packages and other information in
\verb=$WEKA_HOME=. The default location for \verb=$WEKA_HOME= is
\verb=user.home/wekafiles=, where \verb=user.home= is the user's home
directory. You can change the default location for \verb=WEKA_HOME= by
setting this either as an evironment variable for your platform, or by
specifying it as a Java property when starting Weka. E.g.:

{\scriptsize
\begin{verbatim}
  export WEKA_HOME=/home/somewhere/weka_bits_and_bobs
\end{verbatim}}

will set the directory that Weka uses to
\verb=/home/somewhere/weka_bits_and_bobs= under the
LINUX operating system.

The same thing can be accomplished when starting Weka by
specifying a Java property on the command line, E.g.:

{\scriptsize
\begin{verbatim}
  java -DWEKA_HOME=/home/somewhere/weka_bits_and_bobs -jar weka.jar
\end{verbatim}}

Inside \verb=$WEKA_HOME= you will find the main weka log file (weka.log)
and a number of directories:

\begin{itemize}
\item \textbf{packages} holds installed packages. Each package is
  contained its own subdirectory.
\item \textbf{props} holds various Java property files used by
  Weka. This directory replaces the user's home directory (used in
  earlier releases of Weka) as one of the locations checked by Weka
  for properties files (such as DatabaseUtils.props). Weka first
  checks, in order, the current directory (i.e. the directory that
  Weka is launched from), then \verb=$WEKA_HOME/props= and finally the
  weka.jar file for property files.
\item \textbf{repCache} holds the cached copy of the meta data from
  the central package repository. If the contents of this directory
  get corrupted it can be safely deleted and Weka will re-create it on
  the next restart.
\item \textbf{systemDialogs} holds marker files that are created when
  you check ``Don't show this again'' in various system popup
  dialogs. Removing this directory or its contents will cause Weka to
  display those prompts anew.
\end{itemize}

\section{Anatomy of a package}

A Weka package is a zip archive that must unpack to the current
directory. For example, the DTNB package contains the decision table
naive Bayes hybrid classifier and is delivered in a file called
DTNB.zip. When unpacked this zip file creates the following directory
structure:

{\scriptsize
\begin{verbatim}
  <current directory>
    +-DTNB.jar
    +-Description.props
    +-build_package.xml
    +-src
    |   +-main
    |   |   +-java
    |   |       +-weka
    |   |           +-classifiers
    |   |               +-rules
    |   |                   +-DTNB.java
    |   +-test
    |       +-java
    |            +-weka
    |               +-classifiers
    |                   +-rules
    |                       +-DTNBTest.java
    +-lib
    +-doc
\end{verbatim}}

When installing, the package manager will use the value of the
``PackageName'' field in the \texttt{Description.props} file (see below) to
create a directory in \verb=$WEKA_HOME/packages= to hold the package
contents. The contents of the \texttt{doc} directory have not been
shown in the above diagram, but this directory contains javadoc for
the DTNB class. A package \textbf{must} have a
\texttt{Description.props} file and contain at least one \texttt{jar}
file with compiled java classes.  The package manager will attempt to
load all \texttt{jar} files that it finds in the root directory and
the \texttt{lib} directory. Other files are optional, but if the
package is open-source then it is nice to include the source code and
an ant build file that can be used to compile the code. Template
versions of the \texttt{Description.props} file and
\verb=build_package.xml= file are available from the Weka site and
from the Weka wiki.

\subsection{The description file}

A valid package must contain a \texttt{Description.props} file that
provides meta data on the package. Identical files are stored at the
central package repository and the local cache maintained by the
package manager. The package manager uses these files to compare
what is installed to what is available and resolve dependencies.

The \texttt{Description.props} contains basic information on the
package in the following format:

{\scriptsize
\begin{verbatim}
# Template Description file for a Weka package

# Package name (required)
PackageName=funkyPackage

# Version (required)
Version=1.0.0

#Date (year-month-day)
Date=2010-01-01

# Title (required)
Title=My cool algorithm

# Category (recommended)
Category=Classification

# Author (required)
Author=Joe Dev <joe@somewhere.net>,Dev2 <dev2@somewhereelse.net>

# Maintainer (required)
Maintainer=Joe Dev <joe@somewhere.net>

# License (required)
License=GPL 2.0|Mozilla

# Description (required)
Description=This package contains the famous Funky Classifer that performs \
 truely funky prediction.

# Package URL for obtaining the package archive (required)
PackageURL=http://somewhere.net/weka/funkyPackage.zip

# URL for further information
URL=http://somewhere.net/funkyResearchInfo.html

# Enhances various other packages ?
Enhances=packageName1,packageName2,...

# Related to other packages?
Related=packageName1,packageName2,...

# Dependencies (format: packageName (equality/inequality version_number)
Depends=weka (>=3.7.1), packageName1 (=x.y.z), packageName2 (>u.v.w|<=x.y.z),...
\end{verbatim}}

Lines that begin with \verb=#= are comments. The
\texttt{``PackageName''}, \texttt{``Version''}, \texttt{``Title''},
\texttt{``Author''}, \texttt{``Maintainer''}, \texttt{``License''},
\texttt{``Description''} and \texttt{``PackageURL''} fields are
mandatory, the others are optional.

The \texttt{``PackageName''} and \texttt{``Version''} give the name
of the package and version number respectively. The name can consist of
letters, numbers, and the dot character. It should not start with a dot
and should not contain any spaces. The version number is a sequence of
three non-negative integers separated by single ``.'' or ``-'' characters.

The \texttt{``Title''} field should give a one sentence description of
the package. The \texttt{``Description''} field can give a longer
description of the package spaning multiple sentences. It may include
technical references and can use HTML markup.

The \texttt{``Category''} field is strongly recommended as this
information is displayed on both the repository web site and in the
GUI package manager client. In the latter, the user can sort the
packages on the basis of the category field. It is recommended that an
existing category be assigned if possible. Some examples include
(Classification, Text classification, Ensemble learning, Regression,
Clustering, Associations, Preprocessing, Visualization, Explorer,
Experimenter, KnowledgeFlow).

The \texttt{``Author''} field describes who wrote the package and may
include multiple names (separated by commas). Email addresses may be
given in angle brackets after each name. The field is intended for
human readers and no email addresses are automatically extracted.

The \texttt{``Maintainer''} field lists who maintains the package and
should include a single email address, enclosed in angle brackets, for
sending bug reports to.

The \texttt{``License''} field lists the license(s) that apply to the
package. This field may contain the short specification of a license
(such as LGPL, GPL 2.0 etc.) or the string ``file LICENSE'', where
``LICENSE'' exists as a file in the top-level directory of the
package. The string ``Unlimited'' may be supplied to indicate that
there are no restrictions on distribution or use aside from those
imposed by relevant laws.

The \texttt{``PackageURL''} field lists valid URL that points to the
package zip file. This URL is used by the package manager to download
and install the package.

The optional \texttt{``Depends''} field gives a comma separated list
of packages which this package depends on. The name of a package is
optionally followed by a version number constraint enclosed in
parenthesis. Valid operators for version number constraints include
$=, <, >, <=, >=$. The keyword ``weka'' is reserved to refer to the
base Weka system and can be used to indicate a dependency on a 
particular version of Weka. For example:

\begin{verbatim}
  Depends=weka (>=3.7.2), DTNB (=1.0.0)
\end{verbatim}

states that this package requires Weka 3.7.2 or higher and version
1.0.0 of the package DTNB.

\begin{verbatim}
  Depends=weka (>3.7.1|<3.8.0)
\end{verbatim}

states that this package requires a version of Weka between 3.7.1 and
3.8.0.

\begin{verbatim}
  Depends=DTNB (<1.5.0|>=2.0.1)
\end{verbatim}

states that this package requires that a version of the DTNB package
be installed that is either less than version 1.5.0 \textit{or}
greater than or equal to version 2.0.1.

If there is no version number constraint following a package name, the
package manager assumes that the latest version of the dependent
package is suitable.

The optional \texttt{``URL''} field gives a URL at which the user can
find additional online information about the package or its constituent
algorithms.

The optional \texttt{``Enhances''} field can be used to indicate which
other packages this package is based on (i.e. if it extends
methods/algorithms from another package in some fashion).

The optional \texttt{``Related''} field is similar to the
\texttt{``Enhances''} field. It can be used to point the user to other
packages that are related in some fashion to this one.

There are several other fields that can be used to provide information
to assist the user with completing installation (if it can't be
completely accomplished with the package zip file) or display error
messages if necessary components are missing:

{\scriptsize
\begin{verbatim}
MessageToDisplayOnInstall=Funky package requires some extra\n\
 stuff to be installed after installing this package. You will\n\
 need to blah, blah, blah in order to blah, blah, blah...

DoNotLoadIfFileNotPresent=lib/someLibrary.jar,otherStuff/important,...

DoNotLoadIfFileNotPresentMessage=funkyPackage can't be loaded because some \
 funky libraries are missing. Please download funkyLibrary.jar from \
 http://www.funky.com and install in $WEKA_HOME/packages/funkyPackage/lib

DoNotLoadIfClassNotPresent=com.some.class.from.some.Where,org.some.other.Class,...

DoNotLoadIfClassNotPresentMessage=funkyPackage can't be loaded because \
 com.funky.FunkyClass can't be instantiated. Have you downloaded and run \
 the funky software installer for your platform?
\end{verbatim}}

The optional \texttt{``MessageToDisplayOnInstall''} field allows you
to specify special instructions to the user in order to help them
complete the intallation manually.  This message gets displayed on the
console, written to the log and appears in a pop-up information dialog
if using the GUI package manager. It should include ``\verb=\n='' in
order to avoid long lines when displayed in a GUI pop-up dialog.

The optional \texttt{``DoNotLoadIfFileNotPresent''} field can be used
to prevent Weka from loading the package if the named \textit{files}
and/or \textit{directories} are not present in the package's
installation directory. An example is the massiveOnlineAnalysis
package. This package is a connector only package and does not include
the MOA library. Users of this package must download the moa.jar file
separately and copy it to the package's \texttt{lib} directory
manually. Multiple files and directories can be specified as a comma
separated list. All paths are relative to the top-level directory of
the package. \textbf{IMPORTANT}: use forward slashes as separator
characters, as these are portable accross all platforms. The
\texttt{``DoNotLoadIfFileNotPresentMessage''} field can be used to
supply an optional message to display to the user if Weka detects that
a file or directory is missing from the package. This message will be
displayed on the console and in the log.

The optional \texttt{``DoNotLoadIfClassNotPresent''} field can be used
to prevent Weka from loading the package if the named
\textit{class(es)} can't be instantiated. This is useful for packages
that rely on stuff that has to be installed manually by the user. For
example, Java3D is a separate download on all platforms except for
OSX, and installs itself into the system JRE/JDK. The
\texttt{``DoNotLoadIfClassNotPresentMessage''} field can be used to
supply an optional message to display to the user if Weka detects that
a class can't be instantiated. Again, this will be displayed on the
console and in the log.

\subsection{Additional configuration files}

Certain types of packages may require additional configuration files
to be present as part of the package. The last chapter covered various
ways in which Weka can be extended without having to alter the core
Weka code. These plugin mechanisms have been subsumed by the package
management system, so some of the configuration property files they
require must be present in the package's top-level directory if the
package in question contains such a plugin. Examples include
additional tabs for the Explorer, mappings to custom property editors
for Weka's GenericObjectEditor and Knowledge Flow plugins. Here are
some examples:

The scatterPlot3D package adds a new tab to the Explorer. In order to
accomplish this a property has to be set in the Explorer.props file
(which contains default values for the Explorer) in order to tell Weka
to instantiate and display the new panel. The scatterPlot3D file includes
an ``Explorer.props'' file in its top-level directory that has the following
contents:

\begin{verbatim}
# Explorer.props file. Adds the Explorer3DPanel to the Tabs key.
Tabs=weka.gui.explorer.Explorer3DPanel
TabsPolicy=append
\end{verbatim}

This property file is read by the package management system when the
package is loaded and any key-value pairs are added to existing
Explorer properties that have been loaded by the system at startup. If
the key already exists in the Explorer properties, then the package
has the option to either replace (i.e. overwrite) or append to the
existing value. This can be specified with the \texttt{TabsPolicy}
key. In this case, the value
\texttt{weka.gui.explorer.Explorer3DPanel} is appended to any existing
value associated with the \texttt{Tabs} key. \texttt{Explorer3DPanel}
gets instantiated and added as a new tab when the Explorer starts.

Another example is the kfGroovy package. This package adds a plugin
component to Weka's Knowledge Flow that allows a Knowledge Flow step
to be implemented and compiled dynamically at runtime as a Groovy
script. In order for the Knowledge Flow to make the new step appear
in its ``Plugins'' toolbar, there needs to be a''Beans.props'' file
in the package's top-level directory. In the case of kfGroovy, this
property file has the following contents:

\begin{verbatim}
# Specifies that this component goes into the Plugins toolbar
weka.gui.beans.KnowledgeFlow.Plugins=org.pentaho.dm.kf.GroovyComponent
\end{verbatim}

More information on Knowledge Flow plugins is given in Section 7.5.

\section{Contributing a package}

If you have created a package for Weka then there are two
options for making it available to the community. In both cases,
hosting the package's zip archive is the responsibility of
the contributer.

The first, and official, route is to contact the current Weka
maintainer (normally also the admin of the Weka homepage) and supply
your package's \texttt{Description.props} file. The Weka team will
then test downloading and using your package to make sure that there
are no obvious problems with what has been specified in the
\texttt{Description.props} file and that the software runs and does
not contain any malware/malicious code. If all is well, then the
package will become an ``official'' Weka package and the central
package repository meta data will be updated with the package's
\texttt{Description.props} file. \textit{Responsibility for
  maintaining and supporting the package resides with the
  contributer}.

The second, and unofficial, route is to simply make the package's zip
archive available on the web somewhere and advertise it
yourself. Although users will not be able to browse it's description
in the official package repository, they will be able to download and
install it directly from your URL by using the command line version of
the package manager. This route could be attractive for people who
have published a new algorithm and want to quiclky make a beta version
available for others to try without having to go through the official
route.

\section{Creating a mirror of the package meta data repository}

In this section we discuss an easy approach to setting up and
maintaining a mirror of the package meta data repository. Having a
local mirror may provide faster access times than to that of the
official repository on Sourceforge. Extending this approach to the
creation of an alternative central repository (hosting packages not
available at the official repository) should be straight forward.

Just about everything necessary for creating a mirror exists in
the local meta data cache created by Weka's package management
system. This cache resides at \verb=$WEKA_HOME/repCache=. The only
thing missing (in Weka 3.7.2) for a complete mirror is the
file ``images.txt'', that lists all the image files used in the html
index files. This file contains the following two lines:

\begin{verbatim}
  Title-Bird-Header.gif
  pentaho_logo_rgb_sm.png
\end{verbatim}

``images.txt'' is downloaded automatically by the package management
system in Weka 3.7.3 and higher.

To create a mirror:

\begin{enumerate}
\item Copy the contents of \verb=$WEKA_HOME/repCache= to a temporary
  directory. For the purposes of this example we'll call it
  \texttt{tempRep}
\item Change directory into \texttt{tempRep} and run\\\\

\verb=java weka.core.RepositoryIndexGenerator .= \\

Don't forget the ``.'' after the command (this tells
\texttt{RepoistoryIndexGenerator} to operate on the current directory)

\item Change directory to the parent of \texttt{tempRep} and synchronize its
contents to wherever your web server is located (this is easy via rsync under
Nix-like operating systems).
\end{enumerate}

\texttt{RepositoryIndexGenerator} automatically creates the main
index.html file, all the package index.html files and html files
correpsonding to all version prop files for each package. It will also
create \texttt{packageList.txt} and \texttt{numPackages.txt} files.

\textbf{IMPORTANT}: Make sure that all the files in \texttt{tempRep}
are world readable.

It is easy to make packages available that are not part of the
official Weka repository. Assuming you want to add a package called
``funkyPackage'' (as specified by the
``PackageName'' field in the \texttt{Description.props} file):

\begin{enumerate}
\item Create a directory called ``funkyPackage'' in \texttt{tempRep}
\item Copy the \texttt{Description.props} file to \texttt{tempRep/funkyPackage/Latest.props}
\item Copy the \texttt{Description.props} file to \texttt{tempRep/funkyPackage/<version number>.props}, where ``version number'' is the version number specified in the ``Version'' field of \texttt{Description.props}
\item Run \texttt{RepositoryIndexGenerator} as described previously and sync \texttt{tempRep} to your web server
\end{enumerate}

Adding a new version of an existing package is very similar to what
has already been described. All that is required is that the new
\texttt{Description.props} file corresponding to the new version is
copied to \texttt{Latest.props} and to \texttt{<version numer>.props}
in the package's folder. Running \texttt{RepositoryIndexGenerator}
will ensure that all necessary html files are created and supporting
text files are updated.

Automating the mirroring process would simply involve using your OS's
scheduler to execute a script that:

\begin{enumerate}
\item Runs \texttt{weka.core.WekaPackageManager -refresh-cache}
\item rsyncs \verb=$WEKA_HOME/repCache= to \texttt{tempRep}
\item Runs \texttt{weka.core.RepoistoryIndexGenerator}
\item rsyncs \texttt{tempRep} to your web server
\end{enumerate}
